{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial: Exporting BigQuery ML models to AI Platform for online prediction\n",
    "This notebook walks you through the batch training to online inference machine learning pattern; it focuses on the use case where structured data is already in or can be read from BigQuery and predictions must be served in near real time. The intended audience are machine learning engineers and architects with at least a basic understanding of production machine learning deployment concepts."
    "BigQuery ML provides a user-friendly interface for creating ML models from any size datasets and making batch predictions at scale. You can export BigQuery ML models in the TensorFlow SavedModel to make online predictions. This tutorial covers the required steps in three parts:\n",
    "1. Creating a model in BigQuery ML\n",
    "2. Extracting the model from BigQuery ML and loading the model onto AI Platform\n",
    "3. Running online prediction using the Python REST client"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PART 1: Creating a model in BigQuery ML"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set variables for this part of the tutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT_ID=[] #enter your project name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud config set project $PROJECT_ID"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create a dataset to store artifacts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bq show natality || bq mk natality"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bq ls -a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bq show --format=pretty natality"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Load Magic. This allows you to access BigQuery from this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext google.cloud.bigquery"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Review a sample of the data this tutorial uses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "    SELECT\n",
    "      weight_pounds,\n",
    "      is_male,\n",
    "      gestation_weeks,\n",
    "      mother_age\n",
    "    FROM\n",
    "      `bigquery-public-data.samples.natality`\n",
    "    WHERE\n",
    "      weight_pounds IS NOT NULL\n",
    "    LIMIT 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Split data\n",
    "The following query creates a view that adds a computed column. This is used for splitting the data into three parts.\n"
    "Training is comprised of the largest part and is used by BigQuery ML to train the model.\n"  
    "Internally, BigQuery ML splits this into training and validation. Evaluation is the holdout set to test the model performance.\n " 
    "After the model is extracted from BigQuery ML, prediction is used by and hosted in AI Platform Prediction.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "    CREATE OR REPLACE VIEW\n",
    "      `natality.input_view` AS\n",
    "    SELECT\n",
    "      weight_pounds,\n",
    "      is_male,\n",
    "      gestation_weeks,\n",
    "      mother_age,\n",
    "      CASE\n",
    "        WHEN MOD(CAST(ROUND(weight_pounds*100) as int64), 10) < 8 THEN 'training'\n",
    "        WHEN MOD(CAST(ROUND(weight_pounds*100) as int64), 10) = 8 THEN 'evaluation'\n",
    "        WHEN MOD(CAST(ROUND(weight_pounds*100) as int64), 10) = 9 THEN 'prediction'\n",
    "      END AS datasplit\n",
    "    FROM\n",
    "      `bigquery-public-data.samples.natality`\n",
    "    WHERE\n",
    "      weight_pounds IS NOT NULL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "    SELECT *\n",
    "    FROM\n",
    "      `natality.input_view`\n",
    "    LIMIT 100"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Train a linear regression model in BigQuery ML"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "    CREATE OR REPLACE MODEL\n",
    "      `natality.natality_model`\n",
    "    OPTIONS\n",
    "      (model_type='linear_reg',\n",
    "        input_label_cols=['weight_pounds']) AS\n",
    "            SELECT\n",
    "              weight_pounds,\n",
    "              is_male,\n",
    "              gestation_weeks,\n",
    "              mother_age\n",
    "            FROM\n",
    "              `natality.input_view`\n",
    "            WHERE\n",
    "              datasplit = 'training'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "List models with the bq ls -m option."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bq ls -m --format=pretty natality"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get details about natality_model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bq show -m --format=pretty natality.natality_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run ML.EVALUATE against the \"evaluation\" set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "    SELECT\n",
    "      *\n",
    "    FROM\n",
    "      ML.EVALUATE(MODEL `natality.natality_model`,\n",
    "        ( SELECT\n",
    "          weight_pounds,\n",
    "          is_male,\n",
    "          gestation_weeks,\n",
    "          mother_age\n",
    "          FROM\n",
    "            `natality.input_view`\n",
    "          WHERE\n",
    "            datasplit = 'evaluation'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run batch prediction with ML.PREDICT against the \"prediction\" set\n",
    "Use AI Platform for online prediction. Notice that this prediction is done in BigQuery ML."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "    SELECT\n",
    "      predicted_weight_pounds\n",
    "    FROM\n",
    "      ML.PREDICT(MODEL `natality.natality_model`,\n",
    "        (\n",
    "        SELECT\n",
    "          is_male,\n",
    "          gestation_weeks,\n",
    "          mother_age\n",
    "        FROM\n",
    "          `natality.input_view`\n",
    "        WHERE\n",
    "          datasplit = 'prediction'\n",
    "        LIMIT 100))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PART 2: Extract model from BQML and load model onto AI Platform"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set variables for this part of the tutorial\n",
    "AI Platform Prediction supports a model/version struture where one model can have multiple versions. The BigQuery ML model structure does not include versions.  In this part of the tutorial, the BigQuery ML model is extracted into a version directory in Cloud Storage.  Later, AI Platform picks this up as a version of a model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import random\n",
    "\n",
    "REGION=[] #e.g. 'us-central1'\n",
    "MODEL_VERSION=[] #e.g. 'v1'\n",
    "\n",
    "MODEL_BUCKET='gs://{}-{}'.format(PROJECT_ID,str(random.randrange(1000,10000)))\n",
    "MODEL_PATH=os.path.join(MODEL_BUCKET,'export/natality_model',MODEL_VERSION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Cloud Storage bucket to store the extracted SavedModel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gsutil mb $MODEL_BUCKET"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "stream = os.popen(\"bq extract -m natality.natality_model {}\".format(MODEL_PATH))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stream.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bq ls -m natality"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Extract model via bq extract"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bq extract -m natality.natality_model $MODEL_PATH"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use saved_model_cli to get details of the extracted model\n",
    "The saved_model_cli output shows the expected input tensors to the model.  This will become important in structuring the prediction request packets.  As seen below, the model will expect requests in a format of:\n",
    "{'is_male': [DT_BOOL],\n",
    "    'gestation_weeks': [DT_FLOAT],\n",
    "    'mother_age': [DT_FLOAT],\n",
    "    'mother_race': [DT_STRING]}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!saved_model_cli show --dir $MODEL_PATH --tag_set serve --signature_def serving_default"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Model in AI Platform Prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud ai-platform models create natality_model --regions=$REGION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create a version in the model in AI Platform Prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud ai-platform versions create v1 --model=natality_model --framework=tensorflow --runtime-version=1.15 --origin=$MODEL_PATH"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# PART 3: Run online prediction using the Python REST client\n",
    "This section of the tutorial will first create a BQ Python client to query the \"prediction\" dataset into a dataframe.  A slice of the dataframe is then made into an online request using the REST API to AI Platform Prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import bigquery\n",
    "client = bigquery.Client()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create the dataframe to load the \"prediction\" dataset from BQ"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sql = \"\"\"\n",
    "    SELECT\n",
    "      is_male,\n",
    "      gestation_weeks,\n",
    "      mother_age\n",
    "    FROM\n",
    "      `natality.input_view`\n",
    "    WHERE\n",
    "      datasplit = 'prediction'\n",
    "    LIMIT 100\n",
    "\"\"\"\n",
    "\n",
    "df = client.query(sql).to_dataframe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Take a small slice of the dataframe and conver to a list of dictionaries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "request=df.to_dict(orient='records')[2:7]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "request"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Install API client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --upgrade google-api-python-client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import googleapiclient.discovery"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "service = googleapiclient.discovery.build('ml','v1')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "name = 'projects/{}/models/natality_model/versions/{}'.format(PROJECT_ID, MODEL_VERSION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Execute online prediction from AI Platform"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response=service.projects().predict(\n",
    "    name=name,\n",
    "    body={'instances':request}\n",
    ").execute()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Clean Up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud ai-platform versions delete v1 --model=natality_model --quiet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud ai-platform models delete natality_model --quiet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bq rm -r -f natality"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gsutil rm -r $MODEL_BUCKET"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "environment": {
   "name": "tf-cpu.1-15.m48",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf-cpu.1-15:m48"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
